perf(server): SIGUSR2 writes V8 heap snapshot #768
Adds a `process.on('SIGUSR2', ...)` handler that calls
`v8.writeHeapSnapshot('/tmp/lobu-<pid>-<ts>.heapsnapshot')`. This lets us
profile the leak that survived lobu#767: even after that fix, with the queue
healthy, the app pod still grows ~3 MB/30s toward its 1Gi limit.
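A minimal sketch of the handler as described, assuming a TypeScript entrypoint. This is the ungated form from the original description (the review reply further down says it was later put behind ALLOW_HEAP_SNAPSHOT=1); `snapshotPath` and `registerSigusr2Snapshot` are hypothetical names, and only the "snapshot written" prefix of the log line comes from the usage notes.

```typescript
import { writeHeapSnapshot } from "node:v8";

// Hypothetical helper: builds the /tmp/lobu-<pid>-<ts>.heapsnapshot path described above.
function snapshotPath(pid: number = process.pid, ts: number = Date.now()): string {
  return `/tmp/lobu-${pid}-${ts}.heapsnapshot`;
}

// Registers the on-demand dump. The listener only runs when the process receives
// SIGUSR2 (e.g. `kill -USR2 1` inside the container).
function registerSigusr2Snapshot(): void {
  process.on("SIGUSR2", () => {
    // Synchronous: the event loop is blocked until the whole file is written.
    const file = writeHeapSnapshot(snapshotPath());
    // The usage steps wait for a "snapshot written" log line before `kubectl cp`.
    console.log(`snapshot written: ${file}`);
  });
}

registerSigusr2Snapshot();
```

Signal listeners do not keep a Node process alive on their own, so registering this at startup adds no idle cost; it only does work when the signal arrives.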
Usage:
```
POD=$(kubectl get pod -n summaries-prod \
  -l app.kubernetes.io/component=api -o name | head -1)
kubectl exec -n summaries-prod $POD -- kill -USR2 1
# wait for "snapshot written" log line, then copy out:
kubectl cp summaries-prod/$(basename $POD):/tmp/<file>.heapsnapshot \
  ./lobu.heapsnapshot
# open in Chrome DevTools → Memory → Load
```
Notes:
* writeHeapSnapshot is synchronous and blocks the event loop for several
seconds, scaling with heap size; only trigger it manually when
investigating, and never wire it to an automated source.
* SIGUSR2 is free on Node (SIGUSR1 is the one reserved for the
inspector).
* The snapshot goes to /tmp, which is the container's writable tmpfs.
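Because the write is synchronous, its wall-clock duration is also how long the event loop (and any HTTP handler such as a readiness probe) is stalled. A minimal sketch for measuring that on a comparable, non-production process, assuming only Node's `node:v8` and `node:perf_hooks` APIs; `timeSnapshot` is a hypothetical name, and the dump goes to the OS temp directory and is deleted afterwards.

```typescript
import { writeHeapSnapshot } from "node:v8";
import { statSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { performance } from "node:perf_hooks";

interface SnapshotTiming {
  ms: number;       // wall-clock time inside writeHeapSnapshot (event loop stalled)
  bytes: number;    // size of the resulting .heapsnapshot file
  heapUsed: number; // heapUsed just before the write, for comparison
}

function timeSnapshot(dir: string = tmpdir()): SnapshotTiming {
  const target = join(dir, `timing-${process.pid}-${Date.now()}.heapsnapshot`);
  const heapUsed = process.memoryUsage().heapUsed;
  const start = performance.now();
  const file = writeHeapSnapshot(target); // returns only once the file is complete
  const ms = performance.now() - start;
  const bytes = statSync(file).size;
  rmSync(file); // don't leave the dump behind
  return { ms, bytes, heapUsed };
}

const t = timeSnapshot();
console.log(`blocked ${t.ms.toFixed(0)} ms, wrote ${t.bytes} bytes, heapUsed ${t.heapUsed}`);
```

On a small heap this only gives a lower bound; the figure that matters is from a process whose heap is close to the pod's.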
CodeRabbit: No actionable comments were generated in the recent review. 🎉
Walkthrough: Adds an environment-gated SIGUSR2 handler (when ALLOW_HEAP_SNAPSHOT=1) that imports Node's …
Estimated code review effort: 3 (Moderate), ~20 minutes.
Pre-merge checks: ✅ 5 passed.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
pi review: addressed

Three findings from pi on PR #768, all addressed in 4614db3:

1. **Secrets in snapshots:** the SIGUSR2 handler is gated behind ALLOW_HEAP_SNAPSHOT=1, off by default in prod. An operator must explicitly opt the pod in, capture the snapshot, then unset the variable and roll. Workers run under the same UID (the Dockerfile sets no separate USER), so on-disk snapshots are not isolated from anything that can exec as that UID.
2. **No rate limit / cleanup:** writes are single-flight via an in-progress flag; further SIGUSR2s that arrive during a write are dropped with a log line. Snapshots use a single rolling path, /tmp/lobu.heapsnapshot, so even a flag stuck on can't fill the writable layer.
3. **Probe interaction:** documented in the handler comment. Triggering needs headroom under the cgroup memory limit (writeHeapSnapshot allocates roughly the heap size while running), and it blocks /health/ready (a DB SELECT 1) for the duration. This is the caller's concern; there is nothing to fix programmatically without a deploy that is already multi-replica.
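The three fixes can be sketched together as follows. This is a hedged reconstruction from the review notes, not the code in 4614db3: ALLOW_HEAP_SNAPSHOT, /tmp/lobu.heapsnapshot and the "snapshot written" prefix come from the notes; the function names, the drop/failure log text, and the injectable `write`/`log` parameters (there only so the sketch can be tested) are assumptions. Because `writeHeapSnapshot` is synchronous, Node will not run another SIGUSR2 listener until the current write returns, so in this form the in-progress flag is defensive: it only trips if the handler is re-entered or the write is ever made asynchronous.

```typescript
import { writeHeapSnapshot } from "node:v8";

// Single rolling path from the review notes: each capture overwrites the last one,
// so repeated triggers cannot accumulate files in the container's writable layer.
const SNAPSHOT_PATH = "/tmp/lobu.heapsnapshot";

type Writer = (path: string) => string;
type Logger = (msg: string) => void;

// Builds the SIGUSR2 listener with a single-flight guard. `write` and `log` are
// injectable only so this sketch can be tested; the defaults are the real calls.
function makeSnapshotHandler(
  path: string = SNAPSHOT_PATH,
  write: Writer = writeHeapSnapshot,
  log: Logger = console.log,
): () => void {
  let inProgress = false;
  return () => {
    if (inProgress) {
      log("heap snapshot already in progress; dropping SIGUSR2");
      return;
    }
    inProgress = true;
    try {
      // Needs roughly heap-size headroom under the cgroup memory limit, and blocks
      // the event loop (including /health/ready) until the file is written.
      const file = write(path);
      log(`snapshot written: ${file}`);
    } catch (err) {
      log(`heap snapshot failed: ${String(err)}`);
    } finally {
      inProgress = false;
    }
  };
}

// Opt-in gate: nothing is registered unless ALLOW_HEAP_SNAPSHOT=1.
function installGatedSnapshotHandler(
  env: Record<string, string | undefined> = process.env,
): boolean {
  if (env.ALLOW_HEAP_SNAPSHOT !== "1") return false;
  process.on("SIGUSR2", makeSnapshotHandler());
  return true;
}
```

Keeping the gate as a plain env check means opting a pod in or out is just a config change plus a roll, which matches the operator flow in finding 1.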
Why
Post-incident measurement (see lobu#767): with the queue healthy again, the app pod still grows from baseline toward the 1Gi limit. Without an inspector port we can't see what's allocating.
Sample taken from `summaries-app-lobu-app-77756ccdd7-dkh2l`:
That's ~3 MB / 30s of slow growth. Pre-fix, the same pod hit 1Gi in 70 min from the same baseline (driven by the schema-mismatch error pile-up). Post-fix, growth is slower but not zero: there's a residual leak.
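For a rough sense of the remaining runway: ~3 MB per 30 s is ~6 MB/min, or ~360 MB/h. The sample's baseline isn't reproduced above, so the baseline in this sketch is a hypothetical placeholder, MB is treated as MiB against the 1Gi limit, and the extrapolation is linear, which a real leak need not be.

```typescript
// Minutes until `limitMiB` at a constant growth rate (linear extrapolation only).
function minutesToLimit(baselineMiB: number, limitMiB: number, growthMiBPer30s: number): number {
  if (growthMiBPer30s <= 0) return Infinity;
  const perMinute = growthMiBPer30s * 2; // two 30 s intervals per minute
  return Math.max(0, limitMiB - baselineMiB) / perMinute;
}

// Average growth rate (MiB/min) implied by reaching the limit in `minutes`,
// e.g. the pre-fix run that hit 1Gi in 70 min.
function impliedRateMiBPerMin(baselineMiB: number, limitMiB: number, minutes: number): number {
  return (limitMiB - baselineMiB) / minutes;
}

const LIMIT_MIB = 1024; // 1Gi
const HYPOTHETICAL_BASELINE_MIB = 400; // placeholder, NOT the measured baseline

console.log(minutesToLimit(HYPOTHETICAL_BASELINE_MIB, LIMIT_MIB, 3)); // 104
console.log(impliedRateMiBPerMin(HYPOTHETICAL_BASELINE_MIB, LIMIT_MIB, 70));
```

Plugging in the real baseline gives the actual post-fix runway, and comparing the implied pre-fix rate with ~6 MB/min shows how much the lobu#767 fix slowed growth.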
What
`process.on('SIGUSR2', () => v8.writeHeapSnapshot('/tmp/...'))` so we can dump on demand:
```
POD=$(kubectl get pod -n summaries-prod -l app.kubernetes.io/component=api -o name | head -1)
kubectl exec -n summaries-prod $POD -- kill -USR2 1
# wait for "snapshot written" log line, then:
kubectl cp summaries-prod/$(basename $POD):/tmp/<file>.heapsnapshot ./lobu.heapsnapshot
# open in Chrome DevTools → Memory → Load
```
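Beyond loading a single file in DevTools, diffing two snapshots taken a few minutes apart is a common way to see what is growing. A minimal offline sketch, assuming the standard `.heapsnapshot` JSON layout (a flat `nodes` array whose per-node fields are listed in `snapshot.meta.node_fields`, a type enum in `meta.node_types`, and names indexed into `strings`). The function names and file paths are hypothetical, grouping is only roughly like the DevTools Summary view, and very large snapshots may be slow or memory-hungry through `JSON.parse`.

```typescript
import { readFileSync } from "node:fs";

interface HeapSnapshotJson {
  snapshot: { meta: { node_fields: string[]; node_types: unknown[] } };
  nodes: number[];
  strings: string[];
}

type Summary = Map<string, { count: number; selfSize: number }>;

// Groups nodes by "<type>:<name>" and sums their self sizes.
function summarize(snap: HeapSnapshotJson): Summary {
  const fields = snap.snapshot.meta.node_fields;
  const stride = fields.length; // each node occupies `stride` consecutive numbers
  const typeIdx = fields.indexOf("type");
  const nameIdx = fields.indexOf("name");
  const sizeIdx = fields.indexOf("self_size");
  const typeNames = snap.snapshot.meta.node_types[typeIdx] as string[]; // enum for "type"
  const out: Summary = new Map();
  for (let i = 0; i < snap.nodes.length; i += stride) {
    const key = `${typeNames[snap.nodes[i + typeIdx]]}:${snap.strings[snap.nodes[i + nameIdx]]}`;
    const entry = out.get(key) ?? { count: 0, selfSize: 0 };
    entry.count += 1;
    entry.selfSize += snap.nodes[i + sizeIdx];
    out.set(key, entry);
  }
  return out;
}

// Top-N keys by self-size growth between two summaries (after minus before).
function topGrowth(before: Summary, after: Summary, n = 10): Array<[string, number]> {
  const deltas: Array<[string, number]> = [];
  for (const [key, a] of after) {
    const delta = a.selfSize - (before.get(key)?.selfSize ?? 0);
    if (delta > 0) deltas.push([key, delta]);
  }
  return deltas.sort((x, y) => y[1] - x[1]).slice(0, n);
}

function loadSummary(path: string): Summary {
  return summarize(JSON.parse(readFileSync(path, "utf8")) as HeapSnapshotJson);
}
```

Usage would be along the lines of `topGrowth(loadSummary("./before.heapsnapshot"), loadSummary("./after.heapsnapshot"))`, with two captures of the same pod; DevTools' Comparison view does the same job interactively.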
Notes
Test plan